An Efficient Correspondence Based Algorithm for
2D and 3D Model Based Recognition

Thomas  M.  Breuel 

Abstract 

This paper presents a polynomial time algorithm (pruned correspondence search, PCS) with
good average case performance for solving a wide class of geometric maximal matching problems,
including the problem of recognizing 3D objects from a single 2D image. Given two finite sets
of geometric features with error bounds and a polynomial time algorithm that determines the
feasibility of individual matchings, it finds a maximal matching. The algorithm is based on a
pruned depth-first search for correspondences. Pruning is accomplished by representing regions
of search space that have already been explored using an "adjoint list" of correspondences
between image and model points.

The PCS algorithm is connected with the geometry of the underlying recognition problem only
through calls to a verification algorithm. The analysis of the PCS algorithm demonstrates
clearly the effects of the various combinatorial and geometric constraints on the complexity of
the recognition problem.

Efficient  verification  algorithms,  based  on  a  linear  representation  of  location  constraints,  are 
given  for  the  case  of  affine  transformations  among  vector  spaces  and  for  the  case  of  rigid  2D  and 
3D  transformations  with  scale. 

Among the known algorithms that solve the bounded error recognition problem exactly and
completely, the PCS algorithm currently has the lowest complexity. Some preliminary
experiments suggest that PCS is a practical algorithm. Its similarity to existing correspondence
based algorithms means that a number of existing techniques for speedup can be incorporated
into PCS to improve its performance.
1 Introduction


This paper is concerned with an efficient algorithm for geometric object recognition from
visual data. We will study geometric recognition under a bounded error model:
assume we are given a set of image points $B$ (say, in $\mathbb{R}^2$) and a set of model points $M$
(say, in $\mathbb{R}^3$); determine whether there exists a viewing transformation $T$ that will map
each model point to within a distance $\epsilon$ of an image point. A more general version of this
problem is to determine the size of the largest subsets of image and model points that can
be brought into correspondence and the transformation that does this. This is illustrated
in Figure 1.
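As a concrete reading of the decision version of this problem, the following minimal Python sketch tests whether one given transformation maps every model point to within $\epsilon$ of some image point. It assumes, purely for illustration, 2D model points and a 2D similarity transformation (rotation, scale, translation) with circular error bounds; the names `apply_similarity` and `maps_within_bound` are hypothetical and do not come from the paper.

```python
import math

def apply_similarity(point, angle, scale, shift):
    """Rotate by `angle`, scale by `scale`, then translate by `shift` (2D)."""
    x, y = point
    c, s = math.cos(angle), math.sin(angle)
    return (scale * (c * x - s * y) + shift[0],
            scale * (s * x + c * y) + shift[1])

def maps_within_bound(model, image, eps, angle, scale, shift):
    """True if every transformed model point lies within eps of some image point."""
    for m in model:
        tx, ty = apply_similarity(m, angle, scale, shift)
        if not any(math.hypot(tx - bx, ty - by) <= eps for bx, by in image):
            return False
    return True

model = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
# Image: the model rotated by 90 degrees and shifted by (2, 3), with small noise.
image = [(2.0, 3.01), (2.0, 4.0), (0.99, 3.0)]
print(maps_within_bound(model, image, 0.05, math.pi / 2, 1.0, (2.0, 3.0)))
print(maps_within_bound(model, image, 0.05, 0.0, 1.0, (2.0, 3.0)))
```

Checking a single candidate transformation is the easy part; the recognition problem is to decide whether any such transformation exists, which is what the rest of the paper addresses.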

There are several reasons why bounded error models are interesting formalizations of
geometric recognition and estimation problems. Often, bounded error models answer
statistical questions more directly than other statistical methods: in many applications,
we are only concerned with the question whether some particular value does not deviate
from a true value by more than some given bound; the exact distribution of the error
within those bounds is of no concern. Bounded error models also tend to be more robust
and easier to apply than estimation techniques based on more specific models of error.
Finally, bounded error models allow us to carry out a formal analysis of the computational
complexity of recognition algorithms; such an analysis can give us important clues about
the complexity and behavior of other recognition algorithms, such as those based on least
squares methods or alignment.

Bounded error models have received considerable attention in recent years. Their study
in visual recognition (e.g., Grimson and Lozano-Perez, 1983, Baird, 1985, Breuel, 1989,
Breuel, 1990, Cass, 1990) has been preceded by bounded error models in control (see
Norton, 1986, for an introduction), which ultimately culminated in the development of
a polynomial time algorithm for the linear programming problem (Khachian, 1979; see
Papadimitriou and Steiglitz, 1982, for a simple introduction).

Baird, 1985, has analyzed the problem of 2D visual object recognition under a bounded
error model. He showed that the problem of determining a transformation given a 1-1
correspondence (matching) between a subset of image points and a subset of model points
subject to convex polygonal error bounds can be solved efficiently by using a linear
programming algorithm. He then used a simple depth-first search algorithm to search for the
maximal 1-1 correspondence and gave an average case analysis of the complexity of the search
for random point patterns in the absence of spurious data or occlusions and assuming that a
match actually exists. His algorithm empirically performs well even if a few model features
are missing from the image and a few spurious features have been added to the image, but
he could not bound the worst case running time of the algorithm with a polynomial. The
chief advantage of his search technique is, however, that it extends in a straightforward
manner to higher dimensions. The search technique proposed by Grimson and Lozano-Perez,
1983, is similar to Baird's method; they have proposed pruning heuristics that
improve the performance of their search algorithm in practice. But such heuristics have
not yielded a provably polynomial time recognition algorithm (Grimson, 1988, Grimson,
1989).

Cass, 1990, has used a different approach to the same problem. He observes that the space
of transformations is partitioned into a polynomial number of cells by correspondences
between model and image features. The search for maximal 1-1 correspondences can then
be restricted to examining the boundaries of these cells of transformation space. By using
some additional constraints, Cass reduces the problem to a 1D problem, which means
that the boundaries separating the cells are a (provably small) set of discrete points. This
"topological" method has the advantage that it is easily demonstrated to require only a
polynomial number of arithmetic operations in the 2D case. Cass' method is essentially
a sweep of transformation space¹ (e.g., Edelsbrunner, 1987).

A very important contribution of Baird, 1985, was the use of a representation of
transformations that ensured that linear constraints on the location of image features gave rise
to linear constraints on the set of feasible transformations. Baird's linear representation
lets us apply linear programming methods to the verification problem.
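To make the idea of a linear representation concrete, here is a small Python sketch for 2D similarity transformations. It uses one standard parametrization, $T m = (a m_x - c m_y + u,\; c m_x + a m_y + v)$ with $a = s\cos\theta$ and $c = s\sin\theta$, under which the transformed point is linear in $(a, c, u, v)$; whether this coincides in every detail with Baird's representation is an assumption. The axis-aligned box error bound and the helper names `box_constraints` and `satisfies` are likewise illustrative.

```python
def box_constraints(m, b, e):
    """Halfspaces row . (a, c, u, v) <= rhs expressing that T m lies in the
    axis-aligned box of half-width e around the image point b."""
    mx, my = m
    bx, by = b
    x_row = (mx, -my, 1.0, 0.0)   # x-coordinate of T m as a linear form
    y_row = (my, mx, 0.0, 1.0)    # y-coordinate of T m as a linear form
    neg = lambda row: tuple(-w for w in row)
    return [(x_row, bx + e), (neg(x_row), -(bx - e)),
            (y_row, by + e), (neg(y_row), -(by - e))]

def satisfies(params, constraints, tol=1e-12):
    """True if the parameter vector (a, c, u, v) satisfies every halfspace."""
    return all(sum(w * p for w, p in zip(row, params)) <= rhs + tol
               for row, rhs in constraints)

constraints = box_constraints((1.0, 0.0), (2.0, 4.0), 0.1)
# Rotation by 90 degrees (a=0, c=1) plus shift (2, 3) maps (1, 0) to (2, 4).
print(satisfies((0.0, 1.0, 2.0, 3.0), constraints))
# The identity rotation with the same shift maps (1, 0) to (3, 3) instead.
print(satisfies((1.0, 0.0, 2.0, 3.0), constraints))
```

Each pairing contributes four such halfspaces, so verifying a whole matching amounts to a linear feasibility problem over $(a, c, u, v)$, which any linear programming routine can decide.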

Based  on  this  correspondence  between  linear  constraints  on  location  and  halfspaces  (linear 
constraints)  in  transformation  space,  the  existence  of  a  polynomial  time  algorithm  for 
a  wide  class  of  recognition  problems  is  easy  to  see:  the  linear  constraints  on  feature 
locations  will  give  rise  to  a  partition  of  transformation  space  into  a  polynomial  number 
of  cells.  Standard  algorithms  from  computational  geometry  can  be  used  for  enumerating 
these  cells.  Each  individual  cell  will  correspond  to  a  number  of  geometrically  equivalent 
matchings  between  image  and  model  points  that  can  be  determined  by  explicitly  applying 
one  of  the  transformations  in  the  cell  to  the  model  and  comparing  the  result  with  the 
image. 

The performance of transformation space sweep algorithms in the average case is, however,
disappointing: many cells enumerated by the transformation space sweep correspond
only  to  small  matchings.  In  practice,  correspondence  search  algorithms  can  perform 
significantly  better,  even  though  the  worst  case  complexity  of  previous  correspondence 
search  algorithms  is  provably  exponential.  The  empirical  efficiency  of  correspondence 
search  algorithms  is  a  consequence  of  the  fact  that  they  can  use  geometric  constraints  to 
eliminate  most  of  the  smaller  geometrically  inconsistent  matchings. 

This paper first describes the pruned correspondence search (PCS) algorithm, an algorithm
that combines the advantages of worst case polynomial time complexity with the
good average case behavior of previous correspondence algorithms. The PCS algorithm
consists of a polynomial time transformation of a verification algorithm into a recognition
algorithm. The central idea underlying the PCS algorithm is that regions of transformation
space that have already been explored can be represented efficiently and concisely
using a list of pairings between image and model points.

¹ Somewhat similar in spirit to Cass, 1990, but more restricted in scope, is the result of Alt et al., 1988,
which applies only to the problem of proving the existence of a bounded matching between two sets of
points under uniform, circular error bounds.

Figure 1: A formalization of the recognition problem with bounded error: Find the largest
subset of points $m_i$ on the left such that there exists a transformation $T$ (translation,
rotation, scale) of the plane that maps the points into the error bounds $B_j = b_j + E_j$,
given by image points $b_j$ together with error bounds given as sets $E_j$ on the right.

In the second part, the paper describes how to apply the PCS algorithm to the problem
of 2D and 3D model based recognition. In order to do so, we extend the linear formulation
introduced by Baird to the non-linear problem of verifying 3D objects in 2D images
under rigid transformations. Then, worst case and average case complexities for the
algorithm applied to the problems of 2D and 3D recognition from 2D images are derived.
It is also discussed how the PCS algorithm can be integrated into any existing correspondence
search based algorithm with little additional overhead, taking advantage of existing
heuristic methods while guaranteeing worst case polynomial time complexity even in the
presence of large error bounds, clutter, and occlusions.
2 The Algorithm


Let us begin by stating the formal problem that the algorithm presented in this paper
solves (following closely the definitions in Baird, 1985):

Definition 1 Given a set of image points $B = \{b_1, \ldots, b_k\}$, a set of model points
$M = \{m_1, \ldots, m_l\}$, a set of error bounds $\mathcal{E} = \{E_1, \ldots, E_k\}$ and a set of permissible
transformations $\mathcal{T}$, a MATCHING² consists of two subsets $B^\mu \subseteq B$ and $M^\mu \subseteq M$, and a
permutation $\mu$. A matching is FEASIBLE if there exists some transformation $T \in \mathcal{T}$ such
that³ $T b_i \in m_{\mu(i)} + E_i$ for all $b_i \in B^\mu$.

The SIZE of a matching $\mu$ (written as $\mathrm{size}(\mu)$) is the size of $B^\mu$ (or, equivalently, $M^\mu$).

A matching can be viewed as a collection of PAIRINGS or CORRESPONDENCES, where a
pairing is a pair consisting of an image point and a model point;
in other words, a pairing is an element of $B \times M$.

Sometimes we will call a pairing a CONSTRAINT if we think of it as restricting the set of
compatible transformations $\mathcal{T}$.

Definition  2  The  problem  of  verifying  a  bounded  matching  (VBM)  is  to  determine 
whether,  given  a  matching,  there  exists  a  transformation  compatible  with  that  matching. 

Definition 3 The Maximal Bounded Matching (MBM) problem is to find a feasible
matching such that no other feasible matching is larger.

In  the  definition  of  MBM,  it  is  sufficient  to  exhibit  any  one  of  these  maximal  matchings,  as 
long  as  no  other  matching  is  larger.  More  generally,  we  might  wish  the  algorithm  to  return 
a  concise  description  of  all  maximal  matchings;  the  algorithm  presented  in  this  paper  is 
capable of doing this. In fact, the algorithm will return a small set of representative
maximal matchings, which could be used to reconstruct every possible maximal matching
efficiently.

An  alternative  definition  places  the  error  bounds  on  the  model  points.  We  will  return 
briefly  to  this  distinction  later;  for  the  time  being  it  is  irrelevant,  since  we  only  need  to 
make  the  following  assumptions  about  the  bounded  matching  problem  at  hand. 

² The definition of a "matching" follows Baird, 1985; it is not identical to the definition of a matching
in graph theory. There are several terminologies in common use for talking about correspondence
based recognition algorithms. In this paper, we will use the term "correspondence" when discussing
classes of algorithms, and refer to individual correspondences as "pairings" and sets of correspondences
as "matchings".

³ The notation $v + E$, where $v$ is a vector and $E$ is a set, refers to the set $v + E = \{v + e : e \in E\}$.


We assume that there is a polynomial time algorithm, say, VBME (VERIFICATION of a
Bounded Matching with Exclusion), that, given two sets of pairings $P = \{p_1, \ldots, p_n\}$
and $A = \{a_1, \ldots, a_n\}$ (as in Definition 1), can determine whether there exists a
transformation $T$ that is consistent with all the pairings in $P$ and consistent with none of the
pairings in $A$. For example, in the case of convex polygonal error bounds on 2D points,
this can be done using linear programming, as we will see. Note that VBME may, in
general, be a harder problem than VBM: for example, if the constraints imposed by a
pairing on the set of feasible transformations are convex, VBM has to determine whether
the intersection of a number of convex sets is empty, whereas VBME has to determine
whether the intersection of a number of convex sets with one non-convex set is non-empty.
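The difference between VBM and VBME is easiest to see in a toy setting. The sketch below assumes 1D points, pure translations $t$, and a uniform error bound $e$, so that a pairing $(b, m)$ constrains $t$ to the closed interval $[b - m - e,\, b - m + e]$. This setting and the names `interval`, `feasible_interval`, and `vbme` are illustrative assumptions, not the verification algorithm developed in the paper.

```python
import math

def interval(pairing, e):
    """Translations t with |m + t - b| <= e for the pairing (b, m)."""
    b, m = pairing
    return (b - m - e, b - m + e)

def feasible_interval(P, e):
    """VBM part: intersection of the intervals of all pairings in P."""
    lo = max((interval(p, e)[0] for p in P), default=-math.inf)
    hi = min((interval(p, e)[1] for p in P), default=math.inf)
    return lo, hi

def vbme(P, A, e=1.0):
    """True if some t is consistent with all of P and with none of A."""
    lo, hi = feasible_interval(P, e)
    if lo > hi:
        return False                  # P alone is already infeasible
    cursor = lo                       # everything in [lo, cursor) is excluded
    for l, h in sorted(interval(a, e) for a in A):
        if l > cursor:
            break                     # an uncovered gap starts at cursor
        if h >= cursor:
            cursor = h                # [lo, cursor] is now excluded
            if cursor >= hi:
                return False          # the excluded intervals cover [lo, hi]
    return True

print(vbme([(5.0, 0.0)], [(5.5, 0.0)]))               # part of [4, 6] survives
print(vbme([(5.0, 0.0)], [(4.2, 0.0), (5.8, 0.0)]))   # [4, 6] fully covered
```

Even here the set that remains after exclusion is a union of disjoint pieces rather than one interval, which mirrors the non-convexity mentioned above.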

The combinatorial constraint that correspondence based algorithms rely on is the
observation that if some matching $P$ is infeasible (geometrically inconsistent), no matching
$P' \supseteq P$ can be feasible. A natural approach to finding a maximal matching is therefore
a depth first search: we start with an empty matching, and add pairings to it until the
current matching becomes infeasible. We call a matching to which no other pairing can be
added without making it infeasible a LEAF. The set of all leaves forms a set of candidates
for maximal matchings.

Just terminating the search when a matching has become infeasible is, however, insufficient
to guarantee polynomial time complexity of the search algorithm. The reason is that
geometrically  equivalent  matchings  may  be  re-explored  by  the  search  algorithm  a  large 
number  of  times.  Consider  the  extreme  example  of  matching  n  image  points  at  the  origin 
against  n  model  points  at  the  origin:  in  this  case,  there  exist  n!  different  (as  sets)  maximal 
matchings,  and  all  of  these  will  be  explored  as  leaves  by  a  simple  correspondence  search 
algorithm. 
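The factorial growth in this example can be reproduced with a few lines of Python. The sketch assumes a simple search that assigns the image points one after another to unused model points and, because all points coincide, only needs to compare positions directly; `count_leaves` is a hypothetical helper and stands in for no specific algorithm from the literature.

```python
def count_leaves(image, model, eps, used=frozenset(), i=0):
    """Count complete 1-1 assignments reached by a naive depth-first search
    that pairs image[i], image[i+1], ... with still unused model points."""
    if i == len(image):
        return 1
    total = 0
    for j, m in enumerate(model):
        # With every point at the origin this test never prunes anything.
        if j not in used and abs(image[i] - m) <= eps:
            total += count_leaves(image, model, eps, used | {j}, i + 1)
    return total

for n in range(1, 6):
    print(n, count_leaves([0.0] * n, [0.0] * n, 0.1))
```

All of these leaves describe the same region of transformation space, which is why a single representative carries as much information as the full list.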

However, knowing a single representative of this set of geometrically equivalent matchings
is sufficient to recover all the other matchings by alignment, as follows. Given the
representative matching $\mu$, we compute a transformation $T$ consistent with it. We can then
use $T$ to transform the model into the image and test whether any given other matching
$\mu'$ is equivalent to $\mu$ directly.
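The alignment step can be sketched as follows, again in an assumed toy setting of 1D points under translation with a uniform error bound $e$ (the names `consistent_translation` and `pairings_under` are hypothetical). From the representative we compute one consistent translation, apply it to the model, and read off every pairing it satisfies; another matching is equivalent if all of its pairings appear in that list.

```python
def consistent_translation(matching, e):
    """Some translation t with |m + t - b| <= e for every pairing (b, m),
    or None if the matching is infeasible."""
    lo = max(b - m - e for b, m in matching)
    hi = min(b - m + e for b, m in matching)
    return (lo + hi) / 2 if lo <= hi else None

def pairings_under(t, image, model, e):
    """All pairings (b, m) that the translation t brings within the bound."""
    return [(b, m) for b in image for m in model if abs(m + t - b) <= e]

image, model, e = [10.0, 10.1], [0.0, 0.05], 0.2
# Representative leaf: here every image point can be paired with every model point.
representative = [(b, m) for b in image for m in model]
t = consistent_translation(representative, e)
recovered = pairings_under(t, image, model, e)

other = [(10.0, 0.05), (10.1, 0.0)]       # a different 1-1 matching
print(set(other) <= set(recovered))       # is it equivalent to the representative?
```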

It  is  therefore  sufficient  to  return  only  a  single  representative  for  a  set  of  geometrically 
equivalent  matchings.  From  the  above  argument,  we  can  also  deduce  what  technique  we 
can  use  in  order  to  make  sure  that  we  only  compute  one  of  the  possibly  exponentially  many 
equivalent  matchings:  all  geometrically  equivalent  matchings  correspond  to  the  same  set 
of transformations between image and model space. If we can keep track, during the
correspondence search, of which regions of transformation space have already been explored, we
can avoid the exponential behavior of the simple correspondence search algorithms.

The crucial idea presented in this paper is that the regions that have already been explored
by the search algorithm can be represented concisely and efficiently as another set of
pairings, the ADJOINT PAIRINGS. By putting a pairing of a given image point with a given
model point on the adjoint list, we ensure that no transformations that map the model
point to within the given error bounds of the image point will ever be reconsidered by the
algorithm. This fact is the main motivation behind the PCS algorithm.

An algorithm based on these ideas is given in the Scheme programming language⁴ in
Figure 3. The algorithm is implemented by the function all-representatives. This
function generates a list of candidates which can then be tested for maximality later.
The function all-representatives takes as arguments a function vbme implementing
the verification with exclusion algorithm and a list of all possible pairings. The function
complement complements a list of pairings with respect to the list of all pairings pairings.
Other functions will be explained in the proof of the correctness and polynomial time
complexity of the algorithm. The pruning method used by the algorithm is illustrated
schematically in Figure 2.

The pruning step based on the adjoint list of pairings is relatively low cost, and it can
only reduce the number of nodes expanded during the search. It turns out that under
weak assumptions, this pruning step is actually sufficient to guarantee polynomial time
behavior of a correspondence search algorithm. Below, we will present a proof of this fact
based on the geometrical view of transformation space described in, for example, Grimson
and Huttenlocher, 1989, and Cass, 1990, which makes it easy to relate the results about
the complexity of the PCS algorithm to transformation space based methods⁵ (sweeps
and sampling methods).

Consider a pairing between a model point $m_i$, an image point $b_j$ and a set $E_j$ specifying
the error bounds. Corresponding to this choice is a set of transformations $T_{ij} = \{T :
T m_i \in b_j + E_j\}$, i.e., the set of transformations that will map the model point to within
the error bound of the image point⁶.

The set $C$ of all intersections and unions of the $T_{ij}$ forms a lattice under inclusion. The
set of lower bounds of $C - \{\emptyset\}$ forms a PARTITION $\mathcal{P}$ of transformation space⁷. We will
refer to the individual elements of $\mathcal{P}$ as CELLS. A REPRESENTATIVE for a cell is a set of
pairings $P$ such that the set of transformations compatible with all the pairings in $P$ is
just the transformations in the cell.

⁴ See Rees and Clinger, 1986, for a description of the Scheme programming language. The functions
remove-if-not and some are defined like their Common Lisp counterparts. The functions pairing-from
and pairing-to return some unique label for the image or model point, respectively, that constitute the
pairing.

⁵ An alternative proof of the polynomial time complexity of the PCS algorithm based on image space
is possible and yields sharper worst case complexity bounds for important special cases of recognition
under the bounded error model.

⁶ To avoid discussion of some unimportant special cases, we will assume that different pairings give
rise to non-identical constraints in transformation space; i.e., $i \neq i' \lor j \neq j' \Rightarrow T_{ij} \neq T_{i'j'}$.

⁷ Without loss of generality, we assume that $\mathcal{T} = \bigcup_{ij} T_{ij}$. The elements of this partition are unions of
the elements of the arrangement (see Edelsbrunner, 1987) generated by the hyperplanes containing the
faces of the convex polyhedra in transformation space.

For example, in the case studied by Baird, the individual constraints correspond to
halfspaces in transformation space; each $T_{ij}$ consists of the intersection of a number of such
halfspaces (i.e., is itself a convex polyhedron in transformation space). In the case studied
by Cass, each $T_{ij}$ can be represented by a generalized cylindrical shape in⁸ $\mathbb{R}^2 \times C^1$.

In general, the size of $\mathcal{P}$ (i.e., $|\mathcal{P}|$) could be exponential in the number of image and model
features. However, for many classes of transformations and image/model spaces, $|\mathcal{P}|$ is
only polynomial in size. We will examine conditions for this later; in the case of linear
constraints (convex polygonal error bounds), $\mathcal{P}$ is composed of unions of elements of the
arrangement of hyperplanes corresponding to the linear constraints on feature locations.

We need some additional notation for the proof. Imagine that there exists a function
consistent(T, P) that returns true iff the transformation $T$ is consistent with every pairing
$p \in P$. $P$, $Q$, $R$ will refer to sets of pairings (some matching), and $A$, $B$, $C$ refer to sets
of adjoint pairings, i.e., represent regions of transformation space excluded from further
search. Assume that all possible pairings are numbered and list them as $\{q_1, \ldots, q_n\}$. Also,
we introduce the following abbreviations for certain subsets of transformation space:

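Definition 4

$$T(P) = \{T : \forall p \in P : \mathrm{consistent}(T, \{p\})\}$$

$$T(P, A) = \{T : \forall p \in P : \mathrm{consistent}(T, \{p\}),\ \forall a \in A : \neg\,\mathrm{consistent}(T, \{a\})\}$$

$$\mathcal{D}_i^P = T(\{p_i\}, \{p_1, \ldots, p_{i-1}\}) \quad \text{where } P = \{p_1, \ldots, p_n\}$$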

Now,  let  us  examine  the  algorithm  in  more  detail  (refer  to  Figure  3).  In  the  discussion  of 
the  algorithm,  we  will  have  to  refer  to  functions  in  the  Scheme  code;  typewriter  font  and 
standard  mathematical  notation  will  be  used  in  such  cases:  e.g.,  the  Scheme  call  (leaf 
p)  will  be  written  as  leaf(P)  in  proofs. 

Let  us  first  consider  some  simple  lemmas: 

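Lemma 1 The sets $T(P, A)$ and $\mathcal{D}_i^P$, where $P = \{p_1, \ldots, p_n\}$, are given by:

$$T(P, A) = \bigcap_{p \in P} T(\{p\}) - \bigcup_{a \in A} T(\{a\}) \quad \text{and} \quad \mathcal{D}_i^P = T(\{p_i\}) - \bigcup_{j < i} T(\{p_j\})$$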

Proof.  Follows  directly  from  the  definition.  □ 
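These set identities can be checked mechanically on a small example. The sketch below replaces transformation space by a finite set of integer translations and uses 1D pairings $(b, m)$ with error bound 2; both choices, and the names `SPACE`, `consistent`, `T`, and `D`, are assumptions made only so that the sets of Definition 4 can be enumerated. It also checks the disjointness and covering property that the sets $\mathcal{D}_i^P$ are meant to have.

```python
SPACE = range(-20, 21)   # finite stand-in for transformation space (assumption)
E = 2                    # uniform error bound (assumption)

def consistent(t, pairing):
    """True if translation t maps model point m to within E of image point b."""
    b, m = pairing
    return abs(m + t - b) <= E

def T(P, A=()):
    """T(P, A) from Definition 4, restricted to SPACE."""
    return {t for t in SPACE
            if all(consistent(t, p) for p in P)
            and not any(consistent(t, a) for a in A)}

def D(P, i):
    """D_i^P = T({p_i}, {p_1, ..., p_{i-1}}), with i counted from 0 here."""
    return T([P[i]], P[:i])

P = [(5, 0), (6, 0), (9, 1)]
cells = [D(P, i) for i in range(len(P))]

# First identity of Lemma 1 for one choice of pairings and adjoint pairings.
lhs = T(P[:2], P[2:])
rhs = (T([P[0]]) & T([P[1]])) - T([P[2]])

# The D_i^P are pairwise disjoint and cover the union of the T({p_i}).
union_T = set().union(*(T([p]) for p in P))
union_D = set().union(*cells)
print(cells, lhs == rhs, union_T == union_D)
```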

⁸ $C^1$ is the circle.


Figure 2: This figure illustrates schematically how the search tree for a maximal matching
is pruned. The polygon indicates the region in transformation space that is compatible
with the set of pairings $P_1$ at Node 1 and not excluded by the adjoint constraints $N_1$ at
the same node.

The region of transformation space that a particular node is responsible for exploring is shown
in dark grey. Node 2 is responsible for exploring the region where all constraints in $P_1 \cup \{q_1\}$
are satisfied. Node 3 does not need to explore this region anymore; therefore, $q_1$ has been added
to the adjoint constraints when Node 3 is expanded.

The search tree starting at Node 4 can be pruned in this example, because all the matchings
containing $q_3$ will already have been explored by Nodes 2 and 3. Notice that this is a geometrical
fact, not a combinatorial fact: if $q_3$ had been a little higher so that it included some of the
unexplored (white) region, this branch could not have been pruned from the search tree.




(define (all-representatives vbme pairings)
  ;; all the elements in "pairings" that are not in "x"
  (define (complement x)
    (remove-if-not (lambda (y) (not (member y x))) pairings))
  ;; return a list of the form
  ;; (((q1.P) A) ((q2.P) (q1.A)) ((q3.P) (q1 q2.A)) ...)
  ;; (up to the order in which the candidate pairings q are numbered),
  ;; keeping only the entries accepted by vbme
  (define (successors p a)
    (define (successors1 q)
      (if (null? q) '()
          (cons (list (cons (car q) p) (append (cdr q) a))
                (successors1 (cdr q)))))
    (remove-if-not (lambda (x) (apply vbme x))
                   (successors1 (complement p))))
  ;; determine whether "p" is a feasible matching and has no successors
  ;; (only pairings that are not already in "p" are tried)
  (define (leaf p)
    (and (vbme p '())
         (not (some (lambda (x) (vbme (cons x p) '())) (complement p)))))
  ;; recursively compute representatives for all leaves
  ;; that are candidates for being maximal matchings
  (define (representatives p a)
    (if (leaf p) (list p)
        (reduce append
                (map (lambda (x) (apply representatives x))
                     (successors p a)))))
  (representatives '() '()))

Figure  3:  The  pruned  search  algorithm.  See  the  text  for  an  explanation. 
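For readers less familiar with Scheme, the following is a hedged Python transliteration of the search in Figure 3, run on a toy problem. The toy verifier is an assumption made for illustration: points are 1D, transformations are integer translations drawn from a small finite range, and a pairing $(b, m)$ is consistent with $t$ when $|m + t - b| \le 1$. In this reading, `leaf` only tries pairings that are not already in the matching.

```python
def all_representatives(vbme, pairings):
    def complement(x):
        """All pairings that are not in x."""
        return [q for q in pairings if q not in x]

    def successors(p, a):
        """Feasible children of (p, a): add q to p and put the candidates
        listed after q on the adjoint list, as in (append (cdr q) a)."""
        qs = complement(p)
        children = [([q] + p, qs[i + 1:] + a) for i, q in enumerate(qs)]
        return [child for child in children if vbme(*child)]

    def leaf(p):
        """p is feasible and no further pairing can be added to it."""
        return vbme(p, []) and not any(vbme([x] + p, []) for x in complement(p))

    def representatives(p, a):
        if leaf(p):
            return [p]
        found = []
        for p2, a2 in successors(p, a):
            found.extend(representatives(p2, a2))
        return found

    return representatives([], [])

SPACE = range(-20, 21)            # toy transformation space (assumption)

def consistent(t, pairing):
    b, m = pairing
    return abs(m + t - b) <= 1

def vbme(p, a):
    """Some t in SPACE is consistent with all of p and with none of a."""
    return any(all(consistent(t, q) for q in p)
               and not any(consistent(t, q) for q in a) for t in SPACE)

image, model = [0, 10], [0, 10]
pairings = [(b, m) for b in image for m in model]
print(all_representatives(vbme, pairings))
```

On this input the search returns one representative per cell, and the largest of them pairs both image points with their matching model points. Enumerating a discretized transformation space is only a device to keep the example short; the paper instead assumes a polynomial time verifier.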


The idea is that the $\mathcal{D}_i^P$ partition the part of transformation space that is of interest
at any particular node during the search, and that these sets can be represented in the
algorithm concisely and efficiently by a pairing $p_i$ and a list of adjoint pairings $p_1, \ldots, p_{i-1}$
(in the algorithm itself, this is a pair consisting of two lists, (($p_i$) ($p_1$ ... $p_{i-1}$))). We
will therefore refer to calling representatives($P$, $A$) as calling representatives "on
the set" represented by the transformations compatible with the pairings $P$ and adjoint
pairings $A$.

Now, some more trivial lemmas:

Lemma 2 The sets $\mathcal{D}_i^P$ partition the space of transformations accessible from the pairings
in $P = \{p_1, \ldots, p_n\}$:

$$\bigcup_i T(\{p_i\}) = \bigcup_i \mathcal{D}_i^P \quad \text{and} \quad i \neq j \Rightarrow \mathcal{D}_i^P \cap \mathcal{D}_j^P = \{\}$$

representatives($P$, $A$) returns only leaves:

$$\forall Q \in \mathrm{representatives}(P, A) : \mathrm{leaf}(Q) = \mathrm{true}$$

A leaf cannot be divided further by another pairing: a leaf is just a representation, in terms
of a largest set of pairings, of a cell of the partition $\mathcal{P}$ of transformation space induced by
the individual pairings:

$$(\mathrm{leaf}(P) = \mathrm{true}) \land (T(P, \{\}) \cap T_i \neq \emptyset) \Rightarrow T(P, \{\}) \cap T_i = T(P, \{\})$$

Proof. Left to the reader. □

Lemma 3

$$\{T(P) : P \in \mathrm{representatives}(\{\}, \{\})\} = \{T(Q) : \mathrm{leaf}(Q)\}$$

Proof. What we are trying to show here is that representatives will return a
representative for every leaf in transformation space⁹.

Since representatives only returns leaves, "$\subseteq$" has been established.

To prove "$\supseteq$", we will demonstrate that if leaf($Q$) is true and $T(Q, A) \subseteq T(P, A)$ then
representatives($P$, $A$) will return a list containing a representative of $T(Q)$. This will
prove, in particular, that representatives({}, {}) must return a representative for every
leaf, as required, since the empty set of pairings and the empty set of adjoint constraints
correspond to all of transformation space.

If $P$ is itself a leaf, then $T(P) = T(Q)$ and we are done.

⁹ The statement of the lemma uses the cell $T(P)$ in transformation space to emphasize that the
representation of the matchings $P$ and $Q$ as sets or lists of pairings does not affect the result.


Otherwise, consider the set $R = \{r_1, \ldots, r_n\}$ of successor pairings at a search node. If
we use no pruning, the search at this point would explore the intersection $T'$ of the set
of currently feasible transformations $T(P, A)$ with the set of transformations that are
accessible by the pairings in $R$. But the sets $\mathcal{D}'_k = T' \cap \mathcal{D}_k^R$ form a partition of the set $T'$,
by Lemma 2. Thus, there must be a $\mathcal{D}'_k$ such that $T(Q) \subseteq \mathcal{D}'_k$. But representatives will
be called recursively on every one of the sets $\mathcal{D}'_k$, represented by the pair of lists (($r_k$.$P$)
($r_1$ ... $r_{k-1}$.$A$)) and returned by the function successors; representatives returns
the union of the results. By induction, we know that representatives will return a list
containing some representative $Q'$ such that $T(Q) = T(Q')$ when called on the set $\mathcal{D}'_k$. □

Lemma 4

$$\forall Q_1, Q_2 \in \mathrm{representatives}(P, A),\ Q_1 \neq Q_2 : T(Q_1) \cap T(Q_2) = \{\}$$

Proof. If $P$ is a leaf, representatives can only return the one element list ($P$).
Otherwise, representatives will call itself recursively on the disjoint sets $\mathcal{D}_k^R \cap T(P, A)$,
where $R$ is the set of successor pairings at the node.
But for every $Q \in$ representatives($P$, $A$), $T(Q) \subseteq T(P, A)$. Therefore, invocations
of representatives on disjoint sets can never return distinct representatives $Q_1$, $Q_2$ such that
$T(Q_1) \cap T(Q_2) \neq \{\}$. □

Lemma 5 If $P$ and $Q$ are both leaves and representatives for the same cell in $\mathcal{P}$, then $P$
and $Q$ contain the same pairings, and have the same size.

Proof. Assume the lemma is false, and that there exists some pairing $q$ that is in $Q$ but
not in $P$. But since the cell in transformation space that represents both $P$ and $Q$ is not
divided any further, $q$ must be consistent with all the pairings in $P$. This means that we
could add $q$ to $P$. This is in contradiction to the assumption that $P$ is a leaf. □

Lemma 6 The unpruned depth-first search tree will return every matching $Q$ such that
leaf($Q$) is true.

Proof.  ("Unpruned"  here  refers  to  the  version  of  the  algorithm  that  leaves  the  adjoint 
constraints  empty  at  all  times.)  This  statement  is  essentially  equivalent  to  the  correctness 
of  the  algorithm  described  in  Baird,  1985.  □ 

Theorem 1 If constraints partition transformation space into a polynomial number of
cells and the function vbme runs in polynomial time, then all-representatives runs in
polynomial time.


Proof. Let $L$ be the list returned by representatives. Elements of $L$ correspond to
disjoint cells of transformation space (Lemma 4); therefore length($L$) cannot exceed
the total number of cells in $\mathcal{P}$, which is polynomial in the problem size, by assumption.
But every leaf in the search tree contributes an element to $L$. Therefore, the width of the
search tree must be polynomial. It is obvious that the depth of the search tree can be at
most length(pairings). The computation at each node in the search tree also requires
only polynomial time.

The correctness of the algorithm, i.e., the fact that it returns representatives for all the
cells for which an unpruned depth first search tree would have returned a representative,
has been established in Lemmas 3, 5, and 6. □

The  function  all-representatives  does  not  exactly  solve  the  MBM  problem  itself, 
though:  the  definition  of  MBM  requires  that  any  image  or  model  feature  is  used 
only  once  within  a  matching,  but  an  element  in  the  list  of  pairings  returned  by 
all-representatives  can  contain  several  pairings  that  use  the  same  image  or  model 
point. 

To  see  how  we  can  satisfy  these  additional  combinatorial  constraints,  we  introduce  some 
additional  concepts. 

Lemma  7  Every  feasible  matching  P  is  the  subset  of  some  representative  Q  returned  by 
all-representatives. 

Proof. Because of Lemma 5, it suffices to show that every feasible matching is the subset
of some representative of any leaf. But this is trivial: either P is itself a leaf, or we can
add some pairings to it to arrive at a leaf (since we only have a finite number of pairings
to add). □

The combinatorial constraint that no image or model feature may be used twice in a
matching can be expressed as a maximal bipartite graph matching problem between image
and model features (see also Cass, 1988). Let P be a set of pairings. Let the two vertex
sets of the bipartite graph be given by the set of image points in P and the set of model
points in P, and let the edges be given by the pairings contained in P. A largest subset of
pairings that does not re-use any image or model points is a maximal bipartite matching
and can be computed from P in low-order polynomial time (Papadimitriou
and Steiglitz, 1982).
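The reduction above can be sketched in a few lines. This is only an illustration, assuming pairings are given as (image label, model label) tuples; the function name `max_unique_pairings` is hypothetical, and the simple augmenting-path method used here is a stand-in for the faster algorithms in the cited reference.

```python
def max_unique_pairings(pairings):
    """Return a largest subset of `pairings` that uses every image point
    and every model point at most once (bipartite matching by augmenting paths)."""
    # Adjacency: image point -> model points it is paired with in P.
    adj = {}
    for b, m in pairings:
        adj.setdefault(b, []).append(m)

    owner = {}  # model point -> image point currently matched to it

    def try_assign(b, visited):
        # Try to give image point b a model point, re-assigning others if needed.
        for m in adj[b]:
            if m in visited:
                continue
            visited.add(m)
            if m not in owner or try_assign(owner[m], visited):
                owner[m] = b
                return True
        return False

    for b in adj:
        try_assign(b, set())
    return sorted((b, m) for m, b in owner.items())


# Hypothetical set of pairings in which model point "m1" is used three times.
P = [("b1", "m1"), ("b1", "m2"), ("b2", "m1"), ("b3", "m1")]
best = max_unique_pairings(P)
print(best)
```

In this toy input only two of the four pairings can survive, because three image points compete for the single model point "m1".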

Using these observations, we can state the following theorem.


Theorem 2 If constraints partition transformation space into a polynomial number of
cells, then

$$\text{VBME} \in \mathrm{P} \Rightarrow \text{MBM} \in \mathrm{P}$$
Proof. By Lemma 7, the optimal matching is a subset of one of the matchings returned by
all-representatives. We can run the maximal bipartite graph matching algorithm over
all the representatives returned by all-representatives to find the optimal matching
consistent with the combinatorial constraints. □

Imposing the combinatorial constraint of using each image or model feature only once is
rarely necessary in practice, and often undesirable. Other combinatorial constraints and
quality measures are of considerable practical interest, such as allowing multiple image
features to match a single object feature, or using the model boundary length accounted for in
the image instead of the size of the matching. In fact, allowing multiple matches among
straight line segments gives us a simple method to deal with edge fragmentation and
partial occlusions of extended features.

The  following,  much  simpler,  theorem  is  also  of  some  interest. 

Theorem 3

$$\text{MBM} \in \mathrm{P} \Rightarrow \text{VBM} \in \mathrm{P}$$

Proof. MBM will return a maximal matching (our implementation actually can return
representatives for all maximal matchings). It turns out that we only need a simpler
algorithm MBM' that returns the size of a maximal matching: if P is compatible with
some transformation, then it is certainly necessary that MBM'$(B, M)$ returns k, where
$B = \{b_1 + E_1, \ldots, b_k + E_k\}$ and $M = \{m_1, \ldots, m_k\}$, and that MBM'$(B_i, M_i)$, obtained
by deleting $b_i$ from B and $m_i$ from M, reports a matching of size $k - 1$. This condition
also turns out to be sufficient. □


3  Polyhedral  Error  Bounds 

Let  us  now  consider  specific  instances  of  the  Maximal  Bounded  Matching  (MBM)  problem. 

We will see that convex polygonal error bounds (linear constraints) on the position of
image or model points give rise to bounds that form a convex polyhedron in transformation
space for 2D rigid motions or higher dimensional affine transformations (see footnote 10).

Let M be a finite subset of $\mathbb{R}^M$. Let the image points B and error bounds $\mathcal{E} = \{E_i\}$ be
points and subsets of $\mathbb{R}^N$, respectively, where each subset $E_i$ is determined by a set of
linear constraints $\{(u_{ik}, d_{ik})\}$ as follows:

$$E_i = \{ v : u_{ik} \cdot (v - m_i) < d_{ik} \} \quad \text{where } u_{ik} \in \mathbb{R}^N,\ d_{ik} \in \mathbb{R}$$

10 Baird, 1985, gives a slightly different derivation of this fact for the 2D case and states the result in
the general case without proof. Essentially equivalent is the independent result of Ullman and Basri,
1989: they use the observation that the relation between constraints induced by correspondences and the
parameters of the viewing transformation is linear for testing for the consistency of several 2D images of
3D objects without explicitly computing a 3D transformation.

In other words, each image point with error bounds is a convex polyhedron (given by
the set $b_i + E_i$) whose faces are determined by the $(u_{ik}, d_{ik})$. Let the set of transformations
$\mathcal{T}$ be the set of all transformations T = (t, R), where t is a translation (vector), and R is
drawn from a linear subspace of the space of N × M matrices. Let K be the total number of constraints,
i.e., the sum of the number of faces of each constraint polygon.

If we pair a point b from B with a point m from M under the polyhedral error bounds
E, it is easy to see that this pairing implies a collection of linear constraints on the
matching transformation T. Let (u, d) be the parameters determining one of the faces of
E. Requiring that T maps m into b + E is then the same as requiring (for all u and d)
that:

$$u \cdot (Rm + t - b) < d$$

But we can rewrite this as:

$$
\begin{aligned}
d + u \cdot b &> u \cdot (Rm + t) \\
&= u \cdot t + u \cdot Rm \\
&= \sum_k u_k t_k + \sum_{i,j} u_i m_j R_{ij}
\end{aligned}
$$

The last expression is formally like a dot product if we consider the vectors
$C(u, m) = [(u_k)_{k=1,\ldots,N}, (u_i m_j)_{i=1,\ldots,N;\ j=1,\ldots,M}]$ and
$T = [(t_k)_{k=1,\ldots,N}, (R_{ij})_{i=1,\ldots,N;\ j=1,\ldots,M}]$ as elements of
an $N + NM$ dimensional vector space. Observe that the R form a linear vector space (a
subspace of $\mathbb{R}^{NM}$) themselves when considered elementwise. Thus, corresponding to the
linear constraint associated with the pairing (b, m)

$$u \cdot (Rm + t - b) < d$$

is the linear constraint on T = (t, R) given by

$$C(u, m) \cdot T < d + u \cdot b$$

This derivation goes through essentially unchanged if we impose error bounds on the
model points rather than the image points; it simply reverses the roles of b and m, and
the last equation reads:

$$C(u, b) \cdot T < d + u \cdot m$$
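As a small numerical check of the linearization, the sketch below builds the coefficient vector of the formal dot product for one face (u, d) and one pairing (b, m), and compares it with evaluating $u \cdot (Rm + t - b) < d$ directly. The helper names (`constraint_row`, `flatten_transform`, `dot`) and all numbers are hypothetical; the row-major ordering of the entries of R is an assumption made for the example.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def constraint_row(u, m):
    """Coefficients multiplying [t_1..t_N, R_11..R_NM] in u . (R m + t)."""
    row = list(u)  # coefficients of the translation components t_k
    # coefficients u_i * m_j of the matrix entries R_ij, in row-major order
    row += [u[i] * m[j] for i in range(len(u)) for j in range(len(m))]
    return row


def flatten_transform(t, R):
    """Stack (t, R) into one vector, using the same ordering as constraint_row."""
    return list(t) + [R[i][j] for i in range(len(R)) for j in range(len(R[0]))]


# Hypothetical 2D data: one face (u, d), one pairing (b, m), one transformation.
u, d = (1.0, 0.0), 0.5
m, b = (2.0, 1.0), (1.0, 3.0)
t, R = (1.0, 1.0), [[0.0, -1.0], [1.0, 0.0]]  # 90 degree rotation plus translation

mapped = [dot(R[i], m) + t[i] for i in range(2)]          # R m + t
direct = dot(u, [mapped[i] - b[i] for i in range(2)]) < d  # original form
lhs = dot(constraint_row(u, m), flatten_transform(t, R))   # formal dot product
linear = lhs < d + dot(u, b)                               # linear form
print(direct, linear)
```

Each face of each error polyhedron contributes one such row, so a set of pairings turns into a system of linear inequalities in the N + NM unknowns of T.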
Sometimes we may also only have constraints on the orientation rather than the location of
a feature. If we represent the orientation by a unit vector o, we observe that a constraint
on the orientation can also be written as a linear constraint on o as:

$$u \cdot Ro < d$$

This can again be converted into a linear constraint on T = (t, R), since the inequality
can be considered a formal dot product:

$$\sum_k 0 \cdot t_k + \sum_{i,j} u_i o_j R_{ij} < d$$
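The orientation constraint can be checked the same way. In this sketch the translation coefficients are zero and the matrix coefficients are $u_i o_j$; the helper name `orientation_row`, the row-major ordering, and the sample values are assumptions made for illustration.

```python
import math


def orientation_row(u, o):
    """Coefficients multiplying [t_1..t_N, R_11..R_NM] in u . (R o).

    The translation does not affect an orientation, so its coefficients are 0."""
    zeros = [0.0] * len(u)
    return zeros + [u[i] * o[j] for i in range(len(u)) for j in range(len(o))]


# Hypothetical example: rotation by 30 degrees, orientation along the x axis.
angle = math.radians(30.0)
c, s = math.cos(angle), math.sin(angle)
t = (5.0, -3.0)                # arbitrary translation, should not matter
R = [[c, -s], [s, c]]
u, o = (0.0, 1.0), (1.0, 0.0)

T = list(t) + [R[i][j] for i in range(2) for j in range(2)]
lhs = sum(a * b for a, b in zip(orientation_row(u, o), T))      # formal dot product
direct_value = sum(u[i] * sum(R[i][j] * o[j] for j in range(2)) for i in range(2))
print(lhs, direct_value)
```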

The algorithm VBME that we need in order to apply the search algorithm from the
previous section is now simply linear programming in an $N + NM$ dimensional space
given K constraints (the total number of sides of all constraint polygons). We also see
immediately that the space of transformations is partitioned into at most $O(K^{N+NM})$
cells through the application of all constraints.

Thus, we see that the pruned correspondence search (PCS) algorithm is a polynomial time
algorithm for the Maximal Bounded Matching problem between points in $B \subset \mathbb{R}^N$
and $M \subset \mathbb{R}^M$, where

• the set of transformations is given by $b = t + Rm$, $t \in \mathbb{R}^N$

• the R form a linear vector space

• constraints are given by polyhedra associated with each point either in B or M

Essentially  the  same  derivation  goes  through  in  the  special  case  where  we  represent  rota¬ 
tions  and  scaling  in  the  plane  by  complex  numbers,  yielding  a  polynomial  time  algorithm 
for  that  case  as  well. 
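For the complex-number case, a minimal sketch of why the constraint stays linear: writing the rotation and scale as r = a + ib and the translation as t = tx + i ty, a model point m maps to r·m + t, and a face constraint on the image position becomes a linear inequality in the four unknowns (a, b, tx, ty). The helper name `similarity_row` and the sample values are hypothetical.

```python
def similarity_row(u, m):
    """Coefficients of (a, b, tx, ty) in u . (r*m + t), where r = a + ib.

    u and m are 2D vectors given as (x, y) tuples."""
    ux, uy = u
    mx, my = m
    # r*m = (a*mx - b*my) + i(a*my + b*mx); collect the terms in a and b.
    return [ux * mx + uy * my, uy * mx - ux * my, ux, uy]


# Hypothetical values: rotate by 90 degrees and scale by 2, then translate.
r = complex(0.0, 2.0)
t = complex(1.0, -1.0)
m = complex(3.0, 1.0)
u = (0.5, 2.0)

image_point = r * m + t
direct_value = u[0] * image_point.real + u[1] * image_point.imag
params = [r.real, r.imag, t.real, t.imag]
linear_value = sum(c * p for c, p in zip(similarity_row(u, (m.real, m.imag)), params))
print(direct_value, linear_value)
```

The transformation space here is four dimensional, which is the dimension used in the 2D complexity analysis.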

4  2D  Recognition  from  2D  Images 

It  would  be  nice  to  be  able  to  say  more  about  the  complexity  of  the  recognition  algorithm 
in  specific  cases.  We  will  need  to  make  use  of  the  following  theorem: 

Theorem 4 (Megiddo, 1984) For any fixed dimension, the linear programming problem
is solvable in linear time.

This  immediately  lets  us  infer  the  following  theorem: 
Figure 4: A matching problem. The edges of all the images were extracted from a
grey level image using an implementation of the Canny edge detector and approximated
by straight line segments using a splitting algorithm. Shown on the left are the edges
extracted from a cluttered scene with 12 widgets; on the right, the edges of the model widget.
The recognition algorithm has to find the best match (in terms of model boundary length
accounted for in the image) to the object on the right, allowing for translations, rotations,
scaling, and occlusions. Model and image features were required to match within an error
of o pixels.

Theorem 5 Verification of a matching between a 2D image and a 2D model under affine
transformations or rigid transformations/scale, or a 2D image and a 3D model under
affine transformations and projection, can be carried out in $O(n)$ time, where n is the
total number of points.

The problem VBME in the 2D case and in the 3D case with affine transformations is
linear in both the number of pairings in the matching to be verified and the size of the adjoint
list. VBM/VBME in the 3D case with rigid transformations is slightly harder in the worst
case because of the need to intersect a quadratic surface with the polyhedron formed by
the feasible transformations in transformation space (see below).

In the following theorems, for simplicity let n be the maximum of #B and #M, and
assume that there is a fixed maximum number of linear constraints associated with each
model point.


Theorem 6 The worst case running time of the PCS algorithm applied to the problem
of 2D recognition from 2D images under rigid transformation/scale is $O(n^{11})$.


The bound on the worst case running time follows from the fact that transformation space
is divided into at most $O(n^8)$ cells by the individual constraints: there are $O(n^2)$ possible
pairings, and any k linear constraints partition the four dimensional transformation space
into at most $O(k^4)$ cells (see footnote 11). Each linear programming problem is at most of size $O(n)$,
and the adjoint list is at most of size $O(n^2)$, giving a bound on the worst case running
time of $O(n^{11})$. The bipartite graph matching costs at most $O(n^{2.5})$, but we already have
counted in $O(n)$ time for the verification of each search node via linear programming. □
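The cell count in this argument can be made concrete with the standard formula for the maximum number of cells cut out by k hyperplanes in d dimensions, $\sum_{i=0}^{d} \binom{k}{i}$, which grows as $O(k^d)$. The sketch below evaluates it for a hypothetical problem size; taking exactly one linear constraint per pairing is a simplifying assumption, and the function name `max_cells` is illustrative.

```python
from math import comb


def max_cells(k, d):
    """Maximum number of cells into which k hyperplanes can divide d-dimensional space."""
    return sum(comb(k, i) for i in range(d + 1))


n = 10                # hypothetical number of image and model points
k = n * n             # O(n^2) pairings, assuming one linear constraint per pairing
d = 4                 # 2D translation, rotation, and scale

cells = max_cells(k, d)
print(cells, cells <= k ** d)

# Exponents from the proof: n^8 cells, O(n) per linear program, O(n^2) adjoint list.
exponent = 2 * d + 1 + 2
print(exponent)
```

With k proportional to $n^2$ and d = 4 this gives the $O(n^8)$ cells used above, and multiplying by the per-node costs reproduces the exponent 11.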

This  bound  is  very  loose.  First,  the  algorithm  may  actually  only  explore  a  fraction  of 
the  cells  in  transformation  space.  Furthermore,  both  the  linear  programming  algorithm 
and  the  maximal  bipartite  matching  algorithm  can  be  interleaved  with  the  search  so  that 
partial  results  are  taken  advantage  of  at  deeper  levels:  such  techniques  are  already  known 
to reduce the overall complexity of the algorithm further in some cases (Baird, 1985).

Notice that any transformation space sweep algorithm that enumerates all cells for the
same problem has worst case and average case complexity of at least $\Omega(n^9)$, since the
number of cells in the arrangement formed by the linear constraints on feature locations
has size $\Omega(n^8)$ in the worst and average case, and the cost of picking a transformation
and computing a corresponding matching is $\Omega(n)$.

In order to be able to say anything about the average case performance of the PCS
algorithm, observe that the adjoint-based pruning step in the PCS algorithm incurs at
most an additional overhead of $O(n^2)$ over any analogous correspondence based algorithm.
This observation lets us carry over average case analyses such as the ones found in Baird,
1985, Grimson, 1988, and Grimson, 1989, to the corresponding pruned algorithm.


5  3D  Recognition  from  2D  Images 

For the recognition of rigid 3D objects from 2D images, the restrictions on the matrices
R cannot be expressed in linear form anymore. The VBME algorithm now requires the
simultaneous solution of linear inequalities together with a set of quadratic equalities
whose number is independent of problem size. One approach is to embed the manifold
formed by all rotations and scaling matrices in a linear space, solve the verification problem
in the larger space using linear programming, and then determine whether the intersection
between the manifold and the polyhedron containing the feasible transformations is
non-empty. The cost of this intersection test is bounded by $O(k^d)$, where d is the dimension
of the space containing the rotations, and k is the number of linear constraints forming
the surface of the polyhedron. Although k can in principle be as large as $n^2$, it can be
large only very rarely, and we will therefore assume that the cost of the intersection test
is constant in an amortized analysis.

11 This fact can be demonstrated either algebraically or inferred using the Vapnik-Chervonenkis (VC)
dimension.

Figure 5: The solution to the matching problem. The figure shows an optimal transformation
of the model onto the image obtained using the PCS algorithm. Previous algorithms,
such as heuristic search termination or incompletely pruned algorithms, either did not
find the optimal solution, or could not be run to completion because of the combinatorial
explosion of the number of possible matchings.

Putting all these results together, we can bound the worst case complexity of the PCS
algorithm when applied to 3D recognition. Transformation space in this case is five
dimensional ($\mathbb{R}^2 \times R^3$, where $\mathbb{R}^2$ is the two dimensional Euclidean space of translations, and
$R^3$ is the three dimensional manifold of rotations), meaning that the $O(n^2)$ constraints
can give rise to at most $O(n^{10})$ cells. Taking into account the $O(n)$ cost for linear
programming, the $O(n^2)$ cost for maintaining and testing the adjoint list, and the $O(n^{2.5})$
cost  for  the  maximal  bipartite  graph  matching  this  yields  a  bound  of  0(nu-5)  on  the 
worst  case  time  complexity  of  the  algorithm  for  3D  recognition  from  2D  data  under  rigid 
transformations  and  convex  polygonal  error  bounds.  For  the  same  reasons  as  in  the  2D 
case,  this  is  only  a  loose  bound. 
In the average case, again assuming constant time for the intersection test, the algorithm
will perform at most $O(n^2)$ worse than any existing correspondence algorithm. To see
what average case complexity this works out to in practice, consider the case of recognition
from 2D features with location and orientation. In the complete absence of error, we have
to consider $O(n^4)$ pairs of image and model features and solve a set of linear inequalities in
time $O(n)$ in order to find a maximal matching (this is simply the complexity of alignment
algorithms). The same considerations carry over to an average case analysis for sufficiently
small error bounds. The additional overhead of the PCS algorithm is $O(n^2)$, giving a loose
bound on the total average case complexity of $O(n^7)$ for 3D object recognition from 2D
images under the above assumptions.


6  Discussion 

We have seen that under weak assumptions the Maximal Bounded Matching (MBM) problem
is at most polynomially harder than the corresponding verification problem VBME,
and, conversely, that a polynomial time algorithm for MBM implies the existence of a
polynomial time algorithm for the verification problem VBM.

There  are  three  important  properties  of  the  recognition  problem  as  defined  above  that 
make  it  possible  to  find  tractable  algorithms  to  solve  it: 

1.  The  bounded  error  model  makes  the  problem  discrete. 

2. The "quality measure" (size of a matching) is invariant under permutations of image
or model features within a matching.

3. Transformation space has a fixed, bounded dimensionality, and error bounds on
image or model features induce subsets in transformation space that have a simple
structure.

All  three  of  these  properties  are  needed.  For  example,  for  a  quality  measure  whose 
dependence  on  the  matching  itself  is  complicated,  there  is  obviously  no  reason  to  expect 
polynomial  time  performance  of  a  recognition  algorithm:  as  we  have  observed  already 
above,  even  given  a  single  transformation  that  aligns  n  image  and  model  points,  the 
number  of  possible  matchings  that  are  geometrically  consistent  with  the  transformation 
can be as large as n!, and a general quality measure could assign arbitrary values to each
of the possible matchings. Or, if we do not fix the dimensionality of transformation space
(as  in  recognition-with-parts  problems,  where  the  number  of  possible  parts  is  unlimited), 
a  small  number  of  location  constraints  can  give  rise  to  a  large  number  of  distinct  and 
geometrically  feasible  matchings. 
These properties of the recognition problem have also been taken advantage of in the
transformation space sweep (TSS) algorithms for recognition described by Cass, 1990.
An important difference between such algorithms and the correspondence based algorithm
presented here is that the partition of transformation space used in the TSS algorithm,
the arrangement induced by considering all location constraints simultaneously, is finer
than actually needed for finding a maximal matching.

For example, each individual location constraint may give rise to a number of disconnected
sets in transformation space (as it does in the case of 3D recognition under rigid
transformations). A straightforward application of the TSS algorithm has to enumerate and
test separately each of the disconnected components of such sets. In contrast, the PCS
algorithm only relies on the existence of a verification algorithm to determine whether the
intersection of sets of transformation space given as sets of pairings is non-empty: the
topological properties of the sets of feasible transformations involved do not matter (and
sets with complicated topology can still be comparatively easy to test for intersection
with one another).

Another  important  distinction  between  the  PCS  algorithm  and  transformation  space 
sweep  algorithms  is  that  PCS  makes  direct  use  of  combinatorial  constraints.  This  means 
that  the  PCS  algorithm  only  computes  a  subset  of  those  representatives  that  are  actually 
needed  for  finding  a  maximal  matching. 

When compared with previous complexity results for correspondence search techniques
in vision, the existence of the PCS algorithm demonstrates that the absence of unique
feature labels and the presence of clutter and occlusions in visual recognition do not
necessarily lead to an exponential behavior for correspondence search algorithms as has
been conjectured by Grimson, 1989.

Another important difference between correspondence search algorithms as described by
Grimson and Lozano-Perez, 1983, and the algorithm described in this paper is that their
algorithm has expected polynomial time behavior only if a match is known to exist
between the image and the model; determining the absence of a sufficiently
large match between image and model with the unpruned algorithm requires exponential
time even in the expected case. The PCS algorithm described in this paper has worst
case (and, hence, expected) polynomial time complexity in both cases.

The pruning method employed by PCS is a relatively low-cost scheme for pruning a
depth-first tree search for a maximal matching; for convex polygonal location constraints,
in the worst case, it will run a factor of $O(mn)$ (where m is the number of model points and n is
the number of image points) slower than any equivalent correspondence based algorithm
without the pruning step. In practice, we find that the pruning step actually improves
the overall performance of the algorithm in the case of moderate or large error bounds,
since it avoids re-exploration of transformations that have already been tested: existing,
incompletely pruned algorithms will explore too many search nodes when error bounds
are large because of the combinatorial explosion of the number of geometrically feasible
matchings.

Existing  heuristics  can  be  incorporated  into  the  PCS  algorithm  to  improve  its  performance 
while  still  guaranteeing  worst-case  polynomial  complexity.  In  particular,  the  technique  of 
Grimson and Lozano-Perez, 1983, uses a test for pairwise consistency to exclude impossible
matchings  quickly,  and  the  same  technique  can  be  incorporated  into  the  pruned  algorithm 
described  in  this  paper.  Furthermore,  all  the  branches  at  a  node  in  the  search  tree  can 
be  explored  in  parallel,  making  the  algorithm  attractive  for  a  parallel  implementation. 

For the verification step, linear programming methods are particularly desirable, since
they are provably efficient, and good implementations exist. If the non-linearities in the
verification step are of a simple enough form and fixed in number (as, for example, in the
recognition of rigid 3D objects from 2D data), efficient verification algorithms may still
be formulated, as described above.